1 Indroducción



En el presente documento se pretende ir comentando los pasos que se están siguiendo para conseguir predecir la cantidad de agua que llegará a la presa de belesar. Para ello se cuenta con la predicción meteorológica que diariamente se realiza con WRF.

2 Datos



2.1 Histórico DHI



Para la elaboracion del modelo predictivo se cuenta con un Histórico de datos 15 minutales aportados por la presa de Belesar en la que se aportan datos como Aportación, Nivel del embalse, Caudal turbinado, lluvia y algunas otras variables.



DHI<- readRDS(here::here('Data/Parques/Belesar/Historico/Historico_DHI_Belesar.RDS'))

head(DHI)
##                  DATE TEMPERATURA (ºC) LLUVIA ACUMULADA DÍA (l/m2)
## 3 2018-01-01 00:00:00             6.12                       16.83
## 4 2018-01-01 00:15:00             5.44                        0.00
## 5 2018-01-01 00:30:00             5.81                        0.00
## 6 2018-01-01 00:45:00             6.11                        0.00
## 7 2018-01-01 01:00:00             6.22                        0.00
## 8 2018-01-01 01:15:00             5.97                        0.00
##   NIVEL EMBALSE (msnm) APORTACION (m3/s) CAUDAL TURBINADO (m3/s)
## 3               297.75            114.93                       0
## 4               297.76            114.48                       0
## 5               297.77            116.16                       0
## 6               297.78            116.18                       0
## 7               297.78             32.40                       0
## 8               297.79            116.23                       0
##   Q. TURB. BCE. (m3/s)
## 3                17.60
## 4                17.03
## 5                16.88
## 6                16.88
## 7                16.93
## 8                16.90



2.1.1 Outliers



Uno de los problemas que nos encontramos a la hora de tratar los datos es la presencia de valores atípicos o “outliers”. modo de ejemplo podemos observar el siguiente gráfico de la aportación… donde se aprecia un valor negativo enorme y erróneo sin ninguna duda.



plot(DHI$`APORTACION (m3/s)`,
     xlab = "Date",
     ylab= "Aportacion [m³/s]")



Investigando sobre la mejor manera de eliminar outliers encontre una función hecha por un fanático. que funciona de la siguiente manera y la verdad que me gusta.



outlierKD <- function(dt, var) {
  var_name <- eval(substitute(var),eval(dt))
  tot <- sum(!is.na(var_name))
  na1 <- sum(is.na(var_name))
  m1 <- mean(var_name, na.rm = T)
  par(mar = rep(2, 4))
  par(mfrow=c(2, 2), oma=c(0,0,3,0))
  boxplot(var_name, main="With outliers")
  hist(var_name, main="With outliers", xlab=NA, ylab=NA)
  outlier <- boxplot.stats(var_name)$out
  mo <- mean(outlier)
  var_name <- ifelse(var_name %in% outlier, NA, var_name)
  boxplot(var_name, main="Without outliers")
  hist(var_name, main="Without outliers", xlab=NA, ylab=NA)
  title("Outlier Check", outer=TRUE)
  na2 <- sum(is.na(var_name))
  message("Outliers identified: ", na2 - na1, " from ", tot, " observations")
  message("Proportion (%) of outliers: ", (na2 - na1) / tot*100)
  message("Mean of the outliers: ", mo)
  m2 <- mean(var_name, na.rm = T)
  message("Mean without removing outliers: ", m1)
  message("Mean if we remove outliers: ", m2)
  
    dt[as.character(substitute(var))] <- invisible(var_name)
    assign(as.character(as.list(match.call())$dt), dt, envir = .GlobalEnv)
    message("Outliers successfully removed", "\n")
    return(invisible(dt))
   
}



Eliminamos valores negativos y pasamos el filtro para outliers.



DHI$`APORTACION (m3/s)`[which(DHI$`APORTACION (m3/s)`<0)]<- NA 

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 826 from 34552 observations
## Proportion (%) of outliers: 2.39059967585089
## Mean of the outliers: 1075.81016949153
## Mean without removing outliers: 131.482568013429
## Mean if we remove outliers: 108.354577773824
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 317 from 33726 observations
## Proportion (%) of outliers: 0.939927652256419
## Mean of the outliers: 410.450378548896
## Mean without removing outliers: 108.354577773824
## Mean if we remove outliers: 105.488153491574
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 228 from 33409 observations
## Proportion (%) of outliers: 0.682450836600916
## Mean of the outliers: 391.157280701754
## Mean without removing outliers: 105.488153491574
## Mean if we remove outliers: 103.525205991381
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 159 from 33181 observations
## Proportion (%) of outliers: 0.479189897833097
## Mean of the outliers: 380.017798742138
## Mean without removing outliers: 103.525205991381
## Mean if we remove outliers: 102.193901944158
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 168 from 33022 observations
## Proportion (%) of outliers: 0.5087517412634
## Mean of the outliers: 373.650892857143
## Mean without removing outliers: 102.193901944158
## Mean if we remove outliers: 100.805797771961
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 50 from 32854 observations
## Proportion (%) of outliers: 0.152188470201498
## Mean of the outliers: 365.8664
## Mean without removing outliers: 100.805797771961
## Mean if we remove outliers: 100.40179124497
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 4 from 32804 observations
## Proportion (%) of outliers: 0.0121936349225704
## Mean of the outliers: 362.2675
## Mean without removing outliers: 100.40179124497
## Mean if we remove outliers: 100.369856402439
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 1 from 32800 observations
## Proportion (%) of outliers: 0.00304878048780488
## Mean of the outliers: 361.84
## Mean without removing outliers: 100.369856402439
## Mean if we remove outliers: 100.361884508674
## Outliers successfully removed

outlierKD(DHI,`APORTACION (m3/s)`)
## Outliers identified: 0 from 32799 observations
## Proportion (%) of outliers: 0
## Mean of the outliers: NaN
## Mean without removing outliers: 100.361884508674
## Mean if we remove outliers: 100.361884508674
## Outliers successfully removed



plot(DHI$`APORTACION (m3/s)`,
     xlab = "Date",
     ylab= "Aportacion [m³/s]",
     type = "l")



Aplicando esta funcion a los datos de aportación vemos como se han eliminado valores… pero observando los datos… nos damos cuenta de que nuestros datos siguen teniendo demasiado “ruido”. Antes de proseguir hay que rellenar los huecos que hemos ido generando.



2.1.2 Rellenando huecos



Tras hacer la limpieza de los datos, se han generado gran cantidad de NA’s dentro de nuestro dataset.



sum(is.na(DHI$`APORTACION (m3/s)`))
## [1] 7180



A continuación añadimos datos interpolando. Con el paquete imputeTS se puede añadir los datos que faltan interpolando de maneras diferentes:

  • linear (por defecto)
  • spline
  • stine



library(imputeTS)

#Rellenamos Huecos
DHI$`APORTACION (m3/s)`<-  na.interpolation(DHI$`APORTACION (m3/s)`)


#Comprobamos
sum(is.na(DHI$`APORTACION (m3/s)`))
## [1] 0



2.1.3 SMA sobre los datos 15-minutales



Para eliminarlo aplicamos una moving average. Tenemos en cuenta de que tenemos un total de 96 datos diarios. probamos la Moving Average para varios valores.



library(TTR)

for(n in c(1,seq(48,192,length.out = 4))){
  Mavg<- SMA(DHI$`APORTACION (m3/s)`, n) 
  
  plot(DHI$DATE, Mavg,
       xlab = "Date",
       ylab= "Aportacion [m³/s]", 
       type = "l",
       main = paste0("Aportación en Belesar. MA= ", n))
}



Se puede apreciar como de este mode se elimina bastante ruido de la señal de aportación. Pero no sé hasta que punto estamos respetando los datos… Para continuar se decide que utilizaremos una Moving Average con 96 puntos(cantidad de datos generados en 1 dia)… que sería algo parecido a una media diaria (aunque no es lo mismo).



2.1.4 Limpiando datos de nivel del embalse



Tras ejecutar la funcion en buscar de outliers vemos que no hay que en este dataset hay menos datos erroneos.



outlierKD(DHI,`NIVEL EMBALSE (msnm)`)
## Outliers identified: 1 from 39978 observations
## Proportion (%) of outliers: 0.00250137575666617
## Mean of the outliers: 0
## Mean without removing outliers: 310.775323928161
## Mean if we remove outliers: 310.783097781224
## Outliers successfully removed

DHI$`NIVEL EMBALSE (msnm)`<-  na.interpolation(DHI$`NIVEL EMBALSE (msnm)`)



2.1.5 Correlación



A continuación comprobaremos la correlacion que existe entre la aportacion y la variacion del nivel de la presa de Belesar.



x<- 96
aportacion_SMA<- SMA(DHI$`APORTACION (m3/s)`,x)
nivel_SMA<- SMA(c(0,diff(DHI$`NIVEL EMBALSE (msnm)`)*8000+150), x)

aportacion_SMA<- aportacion_SMA[!is.na(aportacion_SMA)]
nivel_SMA<- nivel_SMA[!is.na(nivel_SMA)]


plot(aportacion_SMA,type = "l",
     xlab =" " ,ylab = "",xaxt= 'n',yaxt='n',
     main=paste0("Variacion del nivel y aportacion\n Correlación de: ", 
                 round(cor(aportacion_SMA, nivel_SMA), 3)))
lines(nivel_SMA, col = "red")



2.1.6 Cross-Correlation



Otro análisis previo que se puede hacer es el de la crosscorrelation… para observar con que desfase se obtiene la mejor correlation. Esto es super útil para analizar correlacione de variables que estan desfasadas en el tiempo.



ccf_belesar<- ccf(aportacion_SMA, nivel_SMA, lag.max = 5000)



#Máxima correlación
max(ccf_belesar$acf)
## [1] 0.5892262
#Con cuanto desfase se produce la máxima correlación. 
ccf_belesar$lag[which.max(ccf_belesar$acf)]
## [1] 34



De esta manera podemos darnos cuenta de que estas variables obtienen la mejor correlacion (0.58) para un retraso en la aportación de 34 valores, teniendo en cuenta que nuestro dataset es de datos 15 minutales esto supone 9 horas, es decir, los datos de aportación influyen en el nivel de la presa con un retardo de medio día.



2.1.7 Obtención de medias horarias



A continuación construiremos una tabla con las variables aportacion y nivel obtenidas por diferentes métodos:



  • Haciendo SMA sobre el dataset completo (Hecho anteriormente)
  • Haciendo medias horarias
  • Haciendo SMA sobre las medias horarias



#Retrasamos hacia detrás todos los datos de lluvia porque la acumulada de toda la hora la suma en la hora siguiente....
DHI$`LLUVIA ACUMULADA DÍA (l/m2)`<- lead(DHI$`LLUVIA ACUMULADA DÍA (l/m2)`)

Desacumular_lluvia_2018<- DHI[which(year(DHI$DATE)==2018),] %>% group_by(yday(DATE)) %>% mutate(desacumulada= c(diff(`LLUVIA ACUMULADA DÍA (l/m2)`),0),
                                                                         lluvia=ifelse(desacumulada>=0, desacumulada, 0))


Desacumular_lluvia_2019<- DHI[which(year(DHI$DATE)==2019),] %>% group_by(yday(DATE)) %>% mutate(desacumulada= c(diff(`LLUVIA ACUMULADA DÍA (l/m2)`),0),
                                                                              lluvia=ifelse(desacumulada>=0, desacumulada, 0))
Desacumular_lluvia<- rbind(Desacumular_lluvia_2018, Desacumular_lluvia_2019)


Medias_horarias_2018<- as.data.frame(Desacumular_lluvia[year(Desacumular_lluvia$DATE)==2018,]) %>% group_by(yday(DATE), hour(DATE))  %>% 
  summarize(., Acum_horaria=sum(lluvia, na.rm = T),
            aport_mean=mean(`APORTACION (m3/s)`, na.rm = T),
            nivel_mean=mean(`NIVEL EMBALSE (msnm)`, na.rm = T))

Medias_horarias_2019<- as.data.frame(Desacumular_lluvia[year(Desacumular_lluvia$DATE)==2019,]) %>% group_by(yday(DATE),hour(DATE))  %>% 
  summarize(., Acum_horaria=sum(lluvia, na.rm = T),
            aport_mean=mean(`APORTACION (m3/s)`, na.rm = T),
            nivel_mean=mean(`NIVEL EMBALSE (msnm)`, na.rm = T))

Medias_horarias<- rbind(Medias_horarias_2018, Medias_horarias_2019)
vector_Date<- seq(range(DHI$DATE)[1],range(DHI$DATE)[2],
                  by="hour")

Lluvia_acum_horaria<- as.data.frame(cbind(as.character(vector_Date[2:length(vector_Date)]), 
                                          Medias_horarias[,3:5]))



Lluvia_acum_horaria$`as.character(vector_Date[2:length(vector_Date)])`<- ymd_hms(Lluvia_acum_horaria$`as.character(vector_Date[2:length(vector_Date)])`)


colnames(Lluvia_acum_horaria)<- c("Date", "Lluvia_mm", "aport_mean", "nivel_mean")



2.1.7.1 Comprobación mejor SMA sobre la media horaria



Comprobacoin de máxima correlacion desplazada para diferentes valores de SMA.



for (n in seq(6,24*3, by=6)) {
  y<- n
  aportacion_mean_SMA<- SMA(Lluvia_acum_horaria$aport_mean,y)
  nivel_mean_SMA<- SMA(Lluvia_acum_horaria$nivel_mean,y)
  
  
  
  Tabla_DHI<- as.data.frame(cbind(Lluvia_acum_horaria, 
                                  
                                  aportacion_mean_SMA,
                                  nivel_mean_SMA))
  
  
  
  colnames(Tabla_DHI)<- c(names(Lluvia_acum_horaria), "aport_mean_SMA",  "nivel_mean_SMA")
  Tabla_DHI<- Tabla_DHI[complete.cases(Tabla_DHI),]
  
  
  
  ccf_aport_mean<- ccf(Tabla_DHI$aport_mean_SMA,
                       Tabla_DHI$nivel_mean_SMA, 
                       lag.max = 5000, 
                       plot = T)
  
  print(paste0("Máxima correlacion de ",
               round(max(ccf_aport_mean$acf), digits = 3) ,
               " .Para un desfase de: ",
               round(ccf_aport_mean$lag[which.max(ccf_aport_mean$acf)]/24, digits = 2), 
               " días. Con un SMA de :", 
               n, 
               " horas"))
  
}

## [1] "Máxima correlacion de 0.609 .Para un desfase de: -76.79 días. Con un SMA de :6 horas"

## [1] "Máxima correlacion de 0.616 .Para un desfase de: -76.58 días. Con un SMA de :12 horas"

## [1] "Máxima correlacion de 0.621 .Para un desfase de: -76.42 días. Con un SMA de :18 horas"

## [1] "Máxima correlacion de 0.624 .Para un desfase de: -76.38 días. Con un SMA de :24 horas"

## [1] "Máxima correlacion de 0.627 .Para un desfase de: -76.29 días. Con un SMA de :30 horas"

## [1] "Máxima correlacion de 0.629 .Para un desfase de: -76.25 días. Con un SMA de :36 horas"

## [1] "Máxima correlacion de 0.632 .Para un desfase de: -76.21 días. Con un SMA de :42 horas"

## [1] "Máxima correlacion de 0.634 .Para un desfase de: -76.08 días. Con un SMA de :48 horas"

## [1] "Máxima correlacion de 0.636 .Para un desfase de: -75.92 días. Con un SMA de :54 horas"

## [1] "Máxima correlacion de 0.638 .Para un desfase de: -75.88 días. Con un SMA de :60 horas"

## [1] "Máxima correlacion de 0.64 .Para un desfase de: -75.79 días. Con un SMA de :66 horas"

## [1] "Máxima correlacion de 0.642 .Para un desfase de: -75.67 días. Con un SMA de :72 horas"



Se puede observar como casi todos los casos probando diferentes SMA’s llegan a la conclusión de que la correlación máxima se tiene para 3 días y poco entre aportación y nivel… Se logra mejorar la correlacion aumentando el SMA. La mejor correlación aparece para un desfase de entorno a 75 días. Lo cual no tiene mucho sentido. o por lo menos a nosotros no nos vale…



2.1.8 Creamos Dataset final



Añadimos ahora los SMA’s de todo el dataset y el SMA sobre la media horaria.



#se selecciona 96 porque es la cantidad de datos que corresponden a 1 día. 
x<- 96
aportacion_SMA<- SMA(DHI$`APORTACION (m3/s)`,x)
nivel_SMA<- SMA(DHI$`NIVEL EMBALSE (msnm)`, x)

aportacion_horaria<- aportacion_SMA[DHI$DATE%in%vector_Date]
nivel_horario<- nivel_SMA[DHI$DATE%in%vector_Date]

#Se ponen 24 horas para que equivalgan a 1 día 
y<- 24
aportacion_mean_SMA<- SMA(Lluvia_acum_horaria$aport_mean,y)
nivel_mean_SMA<- SMA(Lluvia_acum_horaria$nivel_mean,y)





Tabla_DHI<- as.data.frame(cbind(Lluvia_acum_horaria, 
                                aportacion_horaria[2:length(aportacion_horaria)],
                                nivel_horario[2:length(nivel_horario)], 
                                aportacion_mean_SMA,
                                nivel_mean_SMA))



colnames(Tabla_DHI)<- c(names(Lluvia_acum_horaria), "aport_SMA", "nivel_SMA", 
                        "aport_mean_SMA",  "nivel_mean_SMA")

Tabla_DHI<- Tabla_DHI[complete.cases(Tabla_DHI),]



2.1.9 Ploteos de aportacion y nivel.



A continuación representamos la aportacion y el nivel. para los tres casos



2.1.9.1 SMA sobre media horaria



 k<- max(Tabla_DHI$nivel_mean_SMA)/max(Tabla_DHI$aport_mean_SMA)
 
  ggplot(data=Tabla_DHI, aes(x=Date))+
    geom_line(aes(y=aport_mean_SMA))+
    xlab("Date")+ylab("Aportación [m³/s]")+
    theme(panel.background = element_blank(),
          panel.grid = element_blank())+
    geom_line(aes(y=nivel_mean_SMA/k),group=1, color="red") +
    scale_y_continuous(sec.axis = sec_axis(trans = ~ . / (1/k), name = "Nivel [msnm]", 
                                           breaks = seq(min(Tabla_DHI$nivel_mean_SMA),
                                                        max(Tabla_DHI$nivel_mean_SMA),
                                                        by=10)),
                       breaks = seq(min(Tabla_DHI$aport_mean_SMA),
                                    max(Tabla_DHI$aport_mean_SMA),
                                    by=20)) + 
    ggtitle("Nivel y aportación. SMA (24 horas) de la media horaria")

ccf_DHI<- ccf(Tabla_DHI$aport_mean_SMA, 
    Tabla_DHI$nivel_mean_SMA, 
    lag.max = 1000)

 print(paste0("Máxima correlacion de ",
               round(max(ccf_DHI$acf), digits = 3) ,
               " .Para un desfase de: ",
               round(ccf_DHI$lag[which.max(ccf_DHI$acf)]/24, digits = 2), 
        " días"))
## [1] "Máxima correlacion de 0.583 .Para un desfase de: -41.67 días"



2.1.9.2 SMA sobre datos 15-minutales



 k<- max(Tabla_DHI$nivel_SMA)/max(Tabla_DHI$aport_SMA)
 
  ggplot(data=Tabla_DHI, aes(x=Date))+
    geom_line(aes(y=aport_SMA))+
    xlab("Date")+ylab("Aportación [m³/s]")+
    theme(panel.background = element_blank(),
          panel.grid = element_blank())+
    geom_line(aes(y=nivel_SMA/k),group=1, color="red") +
    scale_y_continuous(sec.axis = sec_axis(trans = ~ . / (1/k), name = "Nivel [msnm]", 
                                           breaks = seq(min(Tabla_DHI$nivel_SMA),
                                                        max(Tabla_DHI$nivel_SMA),
                                                        by=10)),
                       breaks = seq(min(Tabla_DHI$aport_SMA),
                                    max(Tabla_DHI$aport_SMA),
                                    by=20)) + 
    ggtitle("Nivel y aportación. SMA (96 datos = 1 día) de los datos 15-minutales")

ccf_DHI<- ccf(Tabla_DHI$aport_SMA, 
    Tabla_DHI$nivel_SMA, 
    lag.max = 4000)

 print(paste0("Máxima correlacion de ",
               round(max(ccf_DHI$acf), digits = 3) ,
               " .Para un desfase de: ",
               round(ccf_DHI$lag[which.max(ccf_DHI$acf)]/24, digits = 2), 
        " días"))
## [1] "Máxima correlacion de 0.614 .Para un desfase de: -61.42 días"



2.1.9.3 Media horaria sin SMA



 k<- max(Tabla_DHI$nivel_mean)/max(Tabla_DHI$aport_mean)
 
  ggplot(data=Tabla_DHI, aes(x=Date))+
    geom_line(aes(y=aport_mean))+
    xlab("Date")+ylab("Aportación [m³/s]")+
    theme(panel.background = element_blank(),
          panel.grid = element_blank())+
    geom_line(aes(y=nivel_mean/k),group=1, color="red") +
    scale_y_continuous(sec.axis = sec_axis(trans = ~ . / (1/k), name = "Nivel [msnm]", 
                                           breaks = seq(min(Tabla_DHI$nivel_mean),
                                                        max(Tabla_DHI$nivel_mean),
                                                        by=10)),
                       breaks = seq(min(Tabla_DHI$aport_mean),
                                    max(Tabla_DHI$aport_mean),
                                    by=20)) + 
    ggtitle("Nivel y aportación. Medias horarias")

ccf_DHI<- ccf(Tabla_DHI$aport_mean, 
    Tabla_DHI$nivel_mean, 
    lag.max = 1000)

 print(paste0("Máxima correlacion de ",
               round(max(ccf_DHI$acf), digits = 3) ,
               " .Para un desfase de: ",
               round(ccf_DHI$lag[which.max(ccf_DHI$acf)]/24, digits = 2), 
        " días"))
## [1] "Máxima correlacion de 0.621 .Para un desfase de: -3.29 días"



Por lo anteriormente expuesto… se llega a la conclusión de que el mejor método para relacionar la aportación y el nivel es haciendo una moving average sobre las medias horarias… un SMA de 142 valores… entorno a 6 dias.



2.1.10 Correlación entre la lluvia y la aportación observada



Otra comprobacion que considero pertinente es ver si la lluvia y la aportacion correlan debidamente…



 k<- max(Tabla_DHI$Lluvia_mm)/max(Tabla_DHI$aport_mean_SMA)
 
  ggplot(data=Tabla_DHI, aes(x=Date))+
    geom_line(aes(y=aport_mean_SMA))+
    xlab("Date")+ylab("Aportación [m³/s]")+
    theme(panel.background = element_blank(),
          panel.grid = element_blank())+
    geom_line(aes(y=Lluvia_mm/k),group=1, color="red") +
    scale_y_continuous(sec.axis = sec_axis(trans = ~ . / (1/k), name = "Lluvia Horaria [l/m²]", 
                                           breaks = seq(min(Tabla_DHI$Lluvia_mm),
                                                        max(Tabla_DHI$Lluvia_mm),
                                                        by=10)),
                       breaks = seq(min(Tabla_DHI$aport_mean_SMA),
                                    max(Tabla_DHI$aport_mean_SMA),
                                    by=20)) + 
    ggtitle("Lluvia y aportacion")

ccf_DHI<- ccf(Tabla_DHI$aport_mean_SMA, 
    Tabla_DHI$Lluvia_mm, 
    lag.max = 1000)

 print(paste0("Máxima correlacion de ",
               round(max(ccf_DHI$acf), digits = 3) ,
               " .Para un desfase de: ",
               round(ccf_DHI$lag[which.max(ccf_DHI$acf)]/24, digits = 2), 
        " días"))
## [1] "Máxima correlacion de 0.109 .Para un desfase de: 1.71 días"



Tabla_DHI_cut<- Tabla_DHI[which(month(Tabla_DHI$Date)==12), ]
 k<- max(Tabla_DHI_cut$Lluvia_mm)/max(Tabla_DHI_cut$aport_mean_SMA)
 
  ggplot(data=Tabla_DHI_cut, aes(x=Date))+
    geom_line(aes(y=aport_mean_SMA))+
    xlab("Date")+ylab("Aportación [m³/s]")+
    theme(panel.background = element_blank(),
          panel.grid = element_blank())+
    geom_line(aes(y=Lluvia_mm/k),group=1, color="red") +
    scale_y_continuous(sec.axis = sec_axis(trans = ~ . / (1/k), name = "Nivel [msnm]", 
                                           breaks = seq(min(Tabla_DHI_cut$Lluvia_mm),
                                                        max(Tabla_DHI_cut$Lluvia_mm),
                                                        by=10)),
                       breaks = seq(min(Tabla_DHI_cut$aport_mean_SMA),
                                    max(Tabla_DHI_cut$aport_mean_SMA),
                                    by=20)) + 
    ggtitle("Lluvia y aportacion. Diciembre")

ccf_DHI<- ccf(Tabla_DHI_cut$aport_mean_SMA, 
    Tabla_DHI_cut$Lluvia_mm, 
    lag.max = 1000)

 print(paste0("Máxima correlacion de ",
               round(max(ccf_DHI$acf), digits = 3) ,
               " .Para un desfase de: ",
               round(ccf_DHI$lag[which.max(ccf_DHI$acf)]/24, digits = 2), 
        " días"))
## [1] "Máxima correlacion de 0.065 .Para un desfase de: 17.04 días"



Tabla_DHI_cut<- Tabla_DHI[which(month(Tabla_DHI$Date)==12 | year(Tabla_DHI$Date)==2019), ]
 k<- max(Tabla_DHI_cut$Lluvia_mm)/max(Tabla_DHI_cut$aport_mean_SMA)
 
  ggplot(data=Tabla_DHI_cut, aes(x=Date))+
    geom_line(aes(y=aport_mean_SMA))+
    xlab("Date")+ylab("Aportación [m³/s]")+
    theme(panel.background = element_blank(),
          panel.grid = element_blank())+
    geom_line(aes(y=Lluvia_mm/k),group=1, color="red") +
    scale_y_continuous(sec.axis = sec_axis(trans = ~ . / (1/k), name = "Nivel [msnm]", 
                                           breaks = seq(min(Tabla_DHI_cut$Lluvia_mm),
                                                        max(Tabla_DHI_cut$Lluvia_mm),
                                                        by=10)),
                       breaks = seq(min(Tabla_DHI_cut$aport_mean_SMA),
                                    max(Tabla_DHI_cut$aport_mean_SMA),
                                    by=20)) + 
    ggtitle("Lluvia y aportacion. Diciembre-Enero-Febrero")

ccf_DHI<- ccf(Tabla_DHI_cut$aport_mean_SMA, 
    Tabla_DHI_cut$Lluvia_mm, 
    lag.max = 1000)

 print(paste0("Máxima correlacion de ",
               round(max(ccf_DHI$acf), digits = 3) ,
               " .Para un desfase de: ",
               round(ccf_DHI$lag[which.max(ccf_DHI$acf)]/24, digits = 2), 
        " días"))
## [1] "Máxima correlacion de 0.332 .Para un desfase de: 1.71 días"

Se puede observar como la correlación entre la aportación y la lluvia es apriori muy mala… habrá que hacer trikiñuelas para poder salvar esto… En principio me parece normal que estas variables tengan mala correlación… la aportación del río si que dependerá e la lluvia… pero influyen muchos más factores… además. La lluvia registrada en belesar no tiene porqué ser la mejor para estimar la aportación del río que llega a belesar… puede estar lluviendo muchos kilometros hacia arriba en el valle…



Antes de proseguir guardamos este histórico que hemos construido.



path_dataDHI <- here::here('Data/Parques/Belesar/Historico/DHI_historico_afinado.RDS')

saveRDS(Tabla_DHI,path_dataDHI)



2.2 Datos WRF



Los datos del histórico de WRF están organizados en formato lista… hay para cada localización 2 dataframes;

  • D1: WRF predicción del mismo día
  • D2: WRF predicción de mañana



A continuación se presenta el loop y las funciones que se emplean para generar todo el histórico WRF de Belesar… Cabe mencionar que contamos con una carpeta con Archivos RDS ordenados por fechas para la localización de belesar.



Usamos las siguientes funciones.



lon_lat_df_ls<- function(parque_list){
  fecha<- names(parque_list)
  fecha[!str_detect(fecha, " ")]<- paste0(fecha[!str_detect(fecha, " ")], " 00:00:00")
  
  n_lon<- parque_list[[1]]$lon
  n_lat<- parque_list[[1]]$lat
  
  nombres<- paste0(n_lon,"__",n_lat)
  
  
  list_localizaciones<- list()
  for (localizaciones in 1:length(parque_list[[1]][,1]) ) {
    l<- lapply(parque_list, function(x) x[localizaciones,])
    df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=T))
    df<- as.data.frame(cbind(ymd_hms(fecha), df))
    colnames(df)<-c("fechas", colnames(parque_list[[1]]))
    list_localizaciones[[localizaciones]]<- df
  }
  
  names(list_localizaciones)<- nombres
  return(list_localizaciones)
  
  
}
uv_transformation<- function(tabla_comp){
  
  nombres<- colnames(tabla_comp)
  
  u10<- tabla_comp$U10_MEAN
  v10<- tabla_comp$V10_MEAN
  
  u10_max<- tabla_comp$U10_MAX
  v10_max<- tabla_comp$V10_MAX
  
  wind_abs <- sqrt(u10^2 + v10^2)
  wind_dir_rad <-  atan2(u10/wind_abs, v10/wind_abs) 
  wind_dir_deg1 <-  wind_dir_rad * 180/pi 
  wind_dir_deg2 <-  wind_dir_deg1+ 180 
  
  
  wind_abs_max <- sqrt(u10_max^2 + v10_max^2)
  wind_dir_rad_max <-  atan2(u10_max/wind_abs_max, v10_max/wind_abs_max) 
  wind_dir_deg1_max <-  wind_dir_rad_max * 180/pi 
  wind_dir_deg2_max <-  wind_dir_deg1_max+ 180 
  
  tabla_comp<- as.data.frame(cbind(tabla_comp,wind_abs,wind_dir_deg2,
                                   wind_abs_max,wind_dir_deg2_max))
  colnames(tabla_comp)<- c(nombres,"WS","WD","WS_MAX","WD_MAX")
  tabla_comp$U10_MAX<- NULL
  tabla_comp$V10_MAX<- NULL
  tabla_comp$U10_MEAN<- NULL
  tabla_comp$V10_MEAN<- NULL
  return(tabla_comp)
  
}
extract_rain_data<- function(Belesar_lolat_df){
  
  rain<- Belesar_lolat_df[,c(1,2,3,4,5)]
  rain$pre_acum<- rain$RAINC+rain$RAINNC
  rain$RAINC<- NULL
  rain$RAINNC<- NULL
  
  prep_hourly<- vector()
  for (i in 1:length(rain$pre_acum)) {
    if(i==1){prep_hourly[i]<- rain$pre_acum[i]}else{
      prep_hourly[i]<- rain$pre_acum[i]-rain$pre_acum[i-1]
    }
    
  }
  rain$prep_hourly<- prep_hourly 
  
  return(rain)
}
extract_rain_data2<- function(Belesar_lolat_df){
  
  rain<- Belesar_lolat_df[,c(1,2,3,4,5,6,10,17,20)]
  rain$pre_acum<- rain$RAINC+rain$RAINNC
  
  prep_hourly<- vector()
  for (i in 1:length(rain$pre_acum)) {
    if(i==1){prep_hourly[i]<- rain$pre_acum[i]}else{
      prep_hourly[i]<- rain$pre_acum[i]-rain$pre_acum[i-1]
    }
    
  }
  rain$prep_hourly<- prep_hourly 
  
  return(rain)
}
Return_periodo_Belesar<- function(){
  All_files_Belesar<- list.files(here::here('Data/Parques/Belesar/'), 
                                 recursive = F, full.names = T)
  RDS_Belesar<- All_files_Belesar[str_detect(All_files_Belesar, ".RDS")]
  
  RDS_Belesar1<- RDS_Belesar[!str_detect(RDS_Belesar, "E001")]
  
  
  Belesar_data<- readRDS(RDS_Belesar1[1])
  Belesar_lolat<- lon_lat_df_ls(Belesar_data)
  Belesar_lolat1<- lapply(Belesar_lolat, uv_transformation)
  Belesar_rain<- lapply(Belesar_lolat1, extract_rain_data)
  
  fecha_ini<- Belesar_rain$`-8.02328491210938__42.1343421936035`$fechas[1]
  
  Belesar_data<- readRDS(RDS_Belesar1[length(RDS_Belesar)])
  Belesar_lolat<- lon_lat_df_ls(Belesar_data)
  Belesar_lolat1<- lapply(Belesar_lolat, uv_transformation)
  Belesar_rain<- lapply(Belesar_lolat1, extract_rain_data)
  
  fecha_last<- Belesar_rain$`-8.02328491210938__42.1343421936035`$fechas[length(Belesar_rain$`-8.02328491210938__42.1343421936035`$fechas)]
  
  periodo_WRF<- seq(fecha_ini, fecha_last, by="hour")
  
  Tabla_WRF<- as.data.frame(matrix(ncol = 10, nrow = length(periodo_WRF)))
  colnames(Tabla_WRF)<- colnames(Belesar_rain[[1]])
  Tabla_WRF$fechas<- periodo_WRF
  return(Tabla_WRF)
}



#Listamos archivos dentro de la carpeta de Belesar
All_files_Belesar<- list.files(here::here('Data/Parques/Belesar/'),
                               recursive = F, full.names = T)


#Detectamos cuales son RDS
RDS_Belesar<- All_files_Belesar[str_detect(All_files_Belesar, ".RDS")]

#Eliminamos los RDS que no son de WRF
RDS_Belesar1<- RDS_Belesar[!str_detect(RDS_Belesar, "E001")]

#Construimos Lista para cada instante de tiempo
Lista_localizacion<- list() 
for (i in 1:length(RDS_Belesar1)) {
  Belesar_data<- readRDS(RDS_Belesar1[i])
  Belesar_lolat<- lon_lat_df_ls(Belesar_data)
  Belesar_lolat1<- lapply(Belesar_lolat, uv_transformation)
  Belesar_rain<- lapply(Belesar_lolat1, extract_rain_data2)
  Lista_localizacion[[i]]<- Belesar_rain
}

#Nombramos la lista
names_fechas<- sapply(RDS_Belesar1, function(x){
  r<- str_split(x, "/")
  return(str_remove(str_remove(r[[1]][length(r[[1]])], ".RDS"), "Belesar_"))
})
names(Lista_localizacion)<- names_fechas


#Creamos una lista por localización
Lista_localizacion2<- list()
for (i in 1:length(Lista_localizacion[[1]])) {
  Lista_localizacion2[[i]]<- lapply(Lista_localizacion, 
                               function(x) return(x[[i]]))
}

#Nombramos la lista
names(Lista_localizacion2)<- names(Lista_localizacion[[1]])

#Guardamos la lista
path_lista_total<- here::here('Data/Parques/Belesar/Historico/')
nombre_lista<- paste0(path_lista_total, 'Historico_WRF_Belesar_Variables.RDS')
saveRDS(Lista_localizacion2, nombre_lista)

#Cargamos lista
path_lista_total<- here::here('Data/Parques/Belesar/Historico/')
nombre_lista<- paste0(path_lista_total, 'Historico_WRF_Belesar_Variables.RDS')
Lista_total1<- readRDS(nombre_lista)

#Juntamos todos los Dataframes
Lista_total_MF<- lapply(Lista_total1, function(x) bind_rows(x))

#creamos dos data.frames... uno para D1 y otro para D2
Lista_d1_d2_loc<- list()
for (i in 1:length(Lista_total_MF)) {
  p<- Lista_total_MF[[i]]
  d1<- p[duplicated(p$fechas),]
  d1<-d1[!duplicated(d1$fechas),]
  d2<- p[!duplicated(p$fechas),]
  
  d2_qneed1<-d2[!(d2$fechas%in%d1$fechas),]
  
  
  d1_2<-bind_rows(d1,d2_qneed1)
  d1_2<-d1_2[order(d1_2$fechas),]
  
  d2<-d2[order(d2$fechas),]
  
  d1_2$pre_acum<- NULL
  d2$pre_acum<- NULL
  
  colnames(d1_2)<- c("Date", "LON", "LAT", "RAINC", "RAINNC","RAINSH", "T02_MEAN","PSFC","WS_MAX", "prep_hourly")
  
  colnames(d2)<-  c("Date", "LON", "LAT", "RAINC", "RAINNC","RAINSH", "T02_MEAN","PSFC","WS_MAX", "prep_hourly")
  
  lista_loc_d12<- list(d1_2,d2)
  names(lista_loc_d12)<- c("D1", "D2")
  
  Lista_d1_d2_loc[[i]]<- lista_loc_d12
}

#nombramos la lsta
names(Lista_d1_d2_loc)<- names(Lista_total_MF)

Tabla_periodo1<- Return_periodo_Belesar()
colnames(Tabla_periodo1)<- c("Date", "LON", "LAT", "RAINC", "RAINNC","RAINSH", "T02_MEAN","PSFC","WS_MAX", "prep_hourly")

Lista_d1_d2_loc2<- list()
for (j in 1:length(Lista_d1_d2_loc)) {
  prueba_list<- Lista_d1_d2_loc[[j]]
  lista_retorno<- list()
  for(i in 1:2){
    prueba<- prueba_list[[i]]
    Tabla_periodo<- Tabla_periodo1
    Tabla_periodo$LON[match(prueba$Date,Tabla_periodo$Date)] <- prueba$LON
    Tabla_periodo$LAT[match(prueba$Date,Tabla_periodo$Date)] <- prueba$LAT
    Tabla_periodo$RAINC[match(prueba$Date,Tabla_periodo$Date)] <- prueba$RAINC
    Tabla_periodo$RAINNC[match(prueba$Date,Tabla_periodo$Date)] <- prueba$RAINNC
    Tabla_periodo$RAINSH[match(prueba$Date,Tabla_periodo$Date)] <- prueba$RAINSH
    Tabla_periodo$T02_MEAN[match(prueba$Date,Tabla_periodo$Date)] <- prueba$T02_MEAN
    Tabla_periodo$PSFC[match(prueba$Date,Tabla_periodo$Date)] <- prueba$PSFC
    Tabla_periodo$WS_MAX[match(prueba$Date,Tabla_periodo$Date)] <- prueba$WS_MAX
    Tabla_periodo$prep_hourly[match(prueba$Date,Tabla_periodo$Date)] <- prueba$prep_hourly
    lista_retorno[[i]]<- Tabla_periodo
  }
  names(lista_retorno)<- c("D1", "D2")
  Lista_d1_d2_loc2[[j]]<- lista_retorno 
  
  
}
names(Lista_d1_d2_loc2)<- names(Lista_d1_d2_loc)

#Creamos lista con las variables afinadas
Lista_d1_d2_loc3<- lapply(Lista_d1_d2_loc2, function(x){
  x[[1]]$RAINC<- NULL
  x[[1]]$RAINNC<- NULL
  x[[1]]$RAINSH<- NULL
  x[[2]]$RAINC<- NULL
  x[[2]]$RAINNC<- NULL
  x[[2]]$RAINSH<- NULL
  
  return(x)})
  
names(Lista_d1_d2_loc3)<- names(Lista_d1_d2_loc2)

#Guardamos
path_hist_WRF<- here::here('Data/Parques/Belesar/Historico/Historico_WRF_Belesar_Variables_D1D2.RDS')
saveRDS(Lista_d1_d2_loc3,path_hist_WRF)



2.3 Juntando históricos



Juntamos a cada data.frame de WRF la tabla afinada de DHI.



#Usamos dplyr para juntar ambos datasets

Belesar_WRF<- readRDS(here::here('Data/Parques/Belesar/Historico/Historico_WRF_Belesar_Variables_D1D2.RDS'))
Belesar_DHI<- readRDS(here::here('Data/Parques/Belesar/Historico/DHI_historico_afinado.RDS'))

df2<- Belesar_DHI

Belesar_Merge<- list()
for (j in 1:length(Belesar_WRF)) {
  lista_retorno<- list()
  for(i in 1:2){
    df1<-  Belesar_WRF[[j]][[i]]
    Merge_table<- left_join(df1, df2, by=c("Date"))
    lista_retorno[[i]]<- Merge_table
  }
  names(lista_retorno)<- c("D1", "D2")
  Belesar_Merge[[j]]<- lista_retorno 
}
names(Belesar_Merge)<- names(Belesar_WRF)
#Belesar merge completecases
Belesar_Merge_cc<- list()
for (j in 1:length(Belesar_Merge)) {
    lista_retorno<- list()
  for(i in 1:2){
    df1<-  Belesar_Merge[[j]][[i]]
    df1$Acumulated<- NULL
    Table_fine<- df1[complete.cases(df1),]
    lista_retorno[[i]]<- Table_fine
  }
  names(lista_retorno)<- c("D1", "D2")
  Belesar_Merge_cc[[j]]<- lista_retorno 
}
names(Belesar_Merge_cc)<- names(Belesar_Merge)



#Guardamos
path_hist_WRF<- here::here('Data/Parques/Belesar/Historico/Hist_D1D2_DHI_MERGED.RDS')
saveRDS(Belesar_Merge_cc,path_hist_WRF)



3 Comprobacion de correlacion aplicando diferentes regresiones lineales



#importar
path_hist_WRF<- here::here('Data/Parques/Belesar/Historico/Hist_D1D2_DHI_MERGED.RDS')
clean_data<- readRDS(path_hist_WRF)


#cortar en entrenamiento y predicción
cut_train<- lapply(clean_data, function(x){
  y<- lapply(x, function(r){
    
    fecha_ini<- ymd("2018/10/01")
    fecha_end<- ymd("2019/02/01")
    Jan_data<- r[which(r$Date<fecha_end & r$Date>fecha_ini),]
    return(Jan_data)
    
  })
  return(y)
})
cut_predict<- lapply(clean_data, function(x){
  y<- lapply(x, function(r){
    
    fecha_ini<- ymd("2019/02/01")
    fecha_end<- ymd("2019/02/20")
    Jan_data<- r[which(r$Date<fecha_end & r$Date>fecha_ini),]
    return(Jan_data)
    
  })
  return(y)
})



lista_TL<- list()
for (i in 1:length(cut_train)) {
  lista_d1d2<- list()
  for (j in 1:2) {
    data_predict<- cut_train[[i]][[j]]
    data_predict2<- cut_predict[[i]][[j]]
    
    fit_1  <- lm(Lluvia_mm  ~ prep_hourly, data = data_predict)
    fit_2  <- lm(Lluvia_mm  ~ prep_hourly + T02_MEAN, data = data_predict)
    fit_3  <- lm(Lluvia_mm  ~ prep_hourly + PSFC, data = data_predict)
    fit_4  <- lm(Lluvia_mm  ~ prep_hourly + WS_MAX, data = data_predict)
    fit_5  <- lm(Lluvia_mm  ~ prep_hourly * T02_MEAN, data = data_predict)
    fit_6  <- lm(Lluvia_mm  ~ prep_hourly * PSFC, data = data_predict)
    fit_7  <- lm(Lluvia_mm  ~ prep_hourly * WS_MAX, data = data_predict)
    
    uncorrected<-data_predict2$prep_hourly
    prediction_rain<- predict(fit_1, data.frame(prep_hourly =data_predict2$prep_hourly))
    prediction_rain2<- predict(fit_2, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 T02_MEAN =data_predict2$T02_MEAN))
    prediction_rain3<- predict(fit_3, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 PSFC =data_predict2$PSFC))
    prediction_rain4<- predict(fit_4, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 WS_MAX =data_predict2$WS_MAX))
    prediction_rain5<- predict(fit_5, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 T02_MEAN =data_predict2$T02_MEAN))
    prediction_rain6<- predict(fit_6, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 PSFC =data_predict2$PSFC))
    prediction_rain7<- predict(fit_7, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 WS_MAX =data_predict2$WS_MAX))
    
    observed_rain<-data_predict2$Lluvia_mm 
    
    
    
    tabla_cor<-cbind(cor(uncorrected, observed_rain),
                     cor(prediction_rain, observed_rain),
                     cor(prediction_rain2, observed_rain),
                     cor(prediction_rain3, observed_rain),
                     cor(prediction_rain4, observed_rain),
                     cor(prediction_rain5, observed_rain),
                     cor(prediction_rain6, observed_rain),
                     cor(prediction_rain7, observed_rain))
    
    fit_1  <- svm(Lluvia_mm  ~ prep_hourly, data = data_predict)
    fit_2  <- svm(Lluvia_mm  ~ prep_hourly + T02_MEAN, data = data_predict)
    fit_3  <- svm(Lluvia_mm  ~ prep_hourly + PSFC, data = data_predict)
    fit_4  <- svm(Lluvia_mm  ~ prep_hourly + WS_MAX, data = data_predict)
    fit_5  <- svm(Lluvia_mm  ~ prep_hourly * T02_MEAN, data = data_predict)
    fit_6  <- svm(Lluvia_mm  ~ prep_hourly * PSFC, data = data_predict)
    fit_7  <- svm(Lluvia_mm  ~ prep_hourly * WS_MAX, data = data_predict)
    
    uncorrected<-data_predict2$prep_hourly
    prediction_rain<- predict(fit_1, data.frame(prep_hourly =data_predict2$prep_hourly))
    prediction_rain2<- predict(fit_2, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 T02_MEAN =data_predict2$T02_MEAN))
    prediction_rain3<- predict(fit_3, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 PSFC =data_predict2$PSFC))
    prediction_rain4<- predict(fit_4, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 WS_MAX =data_predict2$WS_MAX))
    prediction_rain5<- predict(fit_5, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 T02_MEAN =data_predict2$T02_MEAN))
    prediction_rain6<- predict(fit_6, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 PSFC =data_predict2$PSFC))
    prediction_rain7<- predict(fit_7, data.frame(prep_hourly =data_predict2$prep_hourly,
                                                 WS_MAX =data_predict2$WS_MAX))
    
    observed_rain<-data_predict2$Lluvia_mm 
    
    tabla_cor<-cbind(tabla_cor,
                     cor(prediction_rain, observed_rain),
                     cor(prediction_rain2, observed_rain),
                     cor(prediction_rain3, observed_rain),
                     cor(prediction_rain4, observed_rain),
                     cor(prediction_rain5, observed_rain),
                     cor(prediction_rain6, observed_rain),
                     cor(prediction_rain7, observed_rain))
    
    colnames(tabla_cor)<- c("uncorrected", "1","2", "3","4","5","6","7",
                            "svm1","svm2", "svm3","svm4","svm5","svm6","svm7")
    lista_d1d2[[j]]<- tabla_cor
  }
  
  names(lista_d1d2)<- c("D1", "D2")
  lista_TL[[i]]<- lista_d1d2
}


colnames(tabla_cor)<- c("uncorrected", "1","2", "3","4","5","6","7",
                        "svm1","svm2", "svm3","svm4","svm5","svm6","svm7")
names(lista_d1d2)<- c("D1", "D2")


Lista_TL_names<-lapply(lista_TL, function(x){
  y<- lapply(x, function(r) {
    r<- as.data.frame(r)
    names(r)<- c("uncorrected", "1","2", "3","4","5","6","7",
                 "svm1","svm2", "svm3","svm4","svm5","svm6","svm7")
    return(r)
  })
  names(y)<- c("D1", "D2")
  return(y)
})
names(Lista_TL_names)<- names(cut_predict)


x<- lapply(Lista_TL_names, function(x){bind_rows(as.data.frame(x))})
y<- bind_rows(x, .id="id")

cor_table<- as.data.frame(matrix(ncol = 3, nrow = 43))
colnames(cor_table)<- c("id","corr", "name")

for (i in 2:length(y[1,])) {
  f<- as.data.frame(cbind(y[which.max(y[,i]), c(1,i)], names( y[which.max(y[,i]), c(1,i)])[2]))
  colnames(f)<- c("id","corr", "name")
  cor_table<- rbind(cor_table,f )
}

cor_table<- cor_table[complete.cases(cor_table),]
table(cor_table$id)
## 
## -7.33221435546875__42.8309211730957 -7.33944702148438__42.9655685424805 
##                                   1                                   2 
## -7.36126708984375__43.3694076538086   -7.478271484375__42.1520957946777 
##                                   3                                   1 
##  -7.4857177734375__42.2868041992188 -7.50823974609375__42.6908493041992 
##                                   2                                   1 
##  -7.5386962890625__43.2293548583984     -7.54638671875__43.363941192627 
##                                   3                                   6 
## -7.65997314453125__42.1464500427246   -7.6756591796875__42.415828704834 
##                                   1                                   6 
## -7.69937133789062__42.8197975158691 -7.73147583007812__43.3581962585449 
##                                   1                                   1 
##  -7.9080810546875__43.2176132202148 
##                                   2
punto_Belesar<- c(42.628577,-7.713948)
extract_lat_lon<- lapply(str_split(cor_table$id,"__"), function(x){
  return(as.data.frame(cbind(x[[1]],x[[2]])))
})

extract_lat_lon<- as.data.frame(bind_rows(extract_lat_lon))
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
cor_table<- as.data.frame(cbind(extract_lat_lon, cor_table[,2:3]))

colnames(cor_table)<- c("LON","LAT", "Corr", "Method")



library(geosphere)
vec_dist<- vector()
for (i in 1:length(cor_table[,1])) {
  vec_dist[i]<- distm(c(-7.713948, 42.628577), c(as.numeric(cor_table[i,1]), 
                                                 as.numeric(cor_table[i,2])), 
                      fun = distHaversine)
  
}

vec_dist<- round(vec_dist/1000, digits = 1)

cor_table<- as.data.frame(cbind(cor_table, vec_dist))


cor_table
##                   LON              LAT      Corr         Method vec_dist
## 111  -7.6756591796875  42.415828704834 0.6819757 D1.uncorrected     23.9
## 112  -7.6756591796875  42.415828704834 0.6819757           D1.1     23.9
## 113  -7.6756591796875  42.415828704834 0.6820122           D1.2     23.9
## 44    -7.478271484375 42.1520957946777 0.6410577           D1.3     56.5
## 81   -7.4857177734375 42.2868041992188 0.6616118           D1.4     42.4
## 311 -7.33944702148438 42.9655685424805 0.6313029           D1.5     48.4
## 251 -7.69937133789062 42.8197975158691 0.6326919           D1.6     21.3
## 82   -7.4857177734375 42.2868041992188 0.6713669           D1.7     42.4
## 312 -7.33944702148438 42.9655685424805 0.6078017        D1.svm1     48.4
## 310 -7.65997314453125 42.1464500427246 0.6045748        D1.svm2     53.9
## 271 -7.33221435546875 42.8309211730957 0.6176615        D1.svm3     38.5
## 114  -7.6756591796875  42.415828704834 0.6423457        D1.svm4     23.9
## 221 -7.50823974609375 42.6908493041992 0.6022039        D1.svm5     18.2
## 115  -7.6756591796875  42.415828704834 0.6090482        D1.svm6     23.9
## 116  -7.6756591796875  42.415828704834 0.6496231        D1.svm7     23.9
## 381  -7.5386962890625 43.2293548583984 0.6070733 D2.uncorrected     68.4
## 382  -7.5386962890625 43.2293548583984 0.6070733           D2.1     68.4
## 383  -7.5386962890625 43.2293548583984 0.6069044           D2.2     68.4
## 421    -7.54638671875  43.363941192627 0.6413140           D2.3     83.0
## 422    -7.54638671875  43.363941192627 0.6217031           D2.4     83.0
## 423    -7.54638671875  43.363941192627 0.6109681           D2.5     83.0
## 431 -7.36126708984375 43.3694076538086 0.6452272           D2.6     87.3
## 424    -7.54638671875  43.363941192627 0.6192445           D2.7     83.0
## 361  -7.9080810546875 43.2176132202148 0.5649173        D2.svm1     67.5
## 411 -7.73147583007812 43.3581962585449 0.5711481        D2.svm2     81.2
## 432 -7.36126708984375 43.3694076538086 0.6271874        D2.svm3     87.3
## 425    -7.54638671875  43.363941192627 0.6902568        D2.svm4     83.0
## 362  -7.9080810546875 43.2176132202148 0.5670526        D2.svm5     67.5
## 433 -7.36126708984375 43.3694076538086 0.6068788        D2.svm6     87.3
## 426    -7.54638671875  43.363941192627 0.6877907        D2.svm7     83.0